Cluster Based Under-Sampling for Unbalanced Cardiovascular Data

Authors

  • M. Mostafizur Rahman
  • D. N. Davis
Abstract

Most medical datasets are not balanced in their class labels. Indeed, in some cases it has been noticed that the given class labels do not accurately represent the characteristics of the data record. Most existing classification methods tend not to perform well on minority-class examples when the dataset is extremely imbalanced. This is because they aim to optimize overall accuracy without considering the relative distribution of each class. In this paper we propose a cluster-based under-sampling technique that addresses the class imbalance problem for our cardiovascular data. It performs significantly better than existing methods.

Keywords: class imbalance, under-sampling, oversampling, clustering, SMOTE.
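The abstract does not say which clustering algorithm or selection rule the authors use, so the following is only a minimal sketch of one common reading of cluster-based under-sampling, not the paper's method. It assumes plain k-means on the majority class and a quota per cluster proportional to cluster size; the function names `kmeans` and `cluster_undersample`, the parameters `k` and `n_keep`, and the synthetic two-dimensional data are all hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority records, spread across k clusters.

    Assumption: each cluster contributes in proportion to its size, with at
    least one record per non-empty cluster, so the total can differ from
    n_keep by a few records because of rounding.
    """
    rng = random.Random(seed)
    assign = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, a in zip(majority, assign) if a == c]
        if not members:
            continue
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


# Synthetic stand-in data (not the cardiovascular dataset): 90 majority
# records in three blobs and 10 minority records.
data_rng = random.Random(42)
majority = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(30)
]
minority = [(data_rng.gauss(5, 0.5), data_rng.gauss(0, 0.5)) for _ in range(10)]

kept = cluster_undersample(majority, n_keep=len(minority), k=3)
print(len(majority), "->", len(kept), "majority records kept;",
      len(minority), "minority records")
```

Sampling inside each cluster, rather than uniformly over the whole majority class, is meant to keep every region of the majority class represented after the reduction. The balanced training set would then be `kept + minority`.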


Similar resources

Cluster-Based Sampling Approaches to Imbalanced Data Distributions

In classification problems, the training data significantly influence classification accuracy. When the data set is highly unbalanced, classification algorithms tend to degenerate by assigning all cases to the most common outcome. Hence, it is important to select suitable training data for classification when the class distribution is imbalanced. In this paper, we propose cluste...
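The degenerate behaviour described here can be illustrated with a small worked example. The 95/5 class split and the helper names `accuracy` and `recall` are assumptions chosen for illustration, not figures from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of records of class `cls` that were predicted as `cls`."""
    preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in preds_for_cls) / len(preds_for_cls)


# Hypothetical labels: 95 majority records (0) and 5 minority records (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that assigns every case to the most common outcome.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))            # 0.95
print("minority recall:", recall(y_true, y_pred, 1))    # 0.0
```

Overall accuracy looks high even though no minority record is identified, which is why per-class measures are usually reported alongside it for imbalanced data.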


Semi Supervised Under-sampling: a Solution to the Class Imbalance Problem for Classification and Feature Selection

Most medical datasets are not balanced in their class labels. Furthermore, in some cases it has been noticed that the given class labels do not accurately represent characteristics of the data record. Most existing classification methods tend not to perform well on minority class examples when the dataset is extremely imbalanced. This is because they aim to optimize the overall accuracy without...


Sliding-Mode-based Improved Direct Active and Reactive Power Control of Doubly Fed Induction Generator under Unbalanced Grid Voltage Condition

This paper proposes an improved direct active and reactive power control (DPC) strategy for a grid-connected doubly fed induction generator (DFIG) based wind-turbine system under unbalanced grid voltage conditions. The method produces the required rotor voltage references based on the sliding mode control (SMC) approach in the stationary reference frame, without the requirement of synchronous coordinate...


A Proposal of Evolutionary Prototype Selection for Class Imbalance Problems

Unbalanced data in a classification problem appears when there are many more instances of some classes than others. Several solutions were proposed to solve this problem at data level by undersampling. The aim of this work is to propose evolutionary prototype selection algorithms that tackle the problem of unbalanced data by using a new fitness function. The results obtained show that a balanci...


Fuzzy-Rough Nearest-Neighbor Classification Approach

This paper proposes a new fuzzy-rough nearest-neighbor (NN) approach based on fuzzy-rough set theory. This approach is better suited to partially exposed and unbalanced data sets than the crisp NN and fuzzy NN approaches. The new method is then applied to financial distress prediction for Chinese listed companies, a typical classification task under partially exposed and unbalanced le...




Publication date: 2013